BKTreebank: Building a Vietnamese Dependency Treebank
Abstract
A dependency treebank is an important resource for any language. In this paper, we present our work on building BKTreebank, a dependency treebank for Vietnamese. We discuss important points in designing the POS tagset, dependency relations, and annotation guidelines. We describe experiments on POS tagging and dependency parsing on the treebank. Experimental results show that the treebank is a useful resource for Vietnamese language processing.
Related resources
From Treebank Conversion to Automatic Dependency Parsing for Vietnamese
This paper presents a new conversion method to automatically transform a constituent-based Vietnamese Treebank into dependency trees. On a dependency Treebank created according to our new approach, we examine two state-of-the-art dependency parsers: the MSTParser and the MaltParser. Experiments show that the MSTParser outperforms the MaltParser. To the best of our knowledge, we report the highes...
Using Collaborative Training Method to Build Vietnamese Dependency Treebank
Because annotating Vietnamese dependency trees is difficult, this paper proposes a method that combines the MST algorithm with an improved Nivre algorithm to build a Vietnamese dependency treebank. The method takes full advantage of the characteristics of collaborative training. First, we build a small set of samples. Second, we use the samples to build two weak learners with two fully redundant views. Then, ...
Proceedings of the Second Asia Pacific International Conference on Information Science and Technology
This paper presents a Vietnamese syntax parsing method that applies the PCFG model and an improved CYK algorithm. The PCFG (Probabilistic Context-Free Grammar) model has been widely applied to language parsing problems and has proven highly effective, especially for English. In this paper, we propose a model that applies PCFG to Vietnamese syntax parsing and an approach for building a set of linguist...
Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language
The recent success of statistical parsing methods has made treebanks important resources for building good parsers. However, constructing high-quality annotated treebanks is a challenging task. We utilized two publicly available parsers, Berkeley and MST parsers, for feedback on improving the quality of part-of-speech tagging for the Vietnamese Treebank. Analysis of the treebank and parsi...
Building a Large Syntactically-Annotated Corpus of Vietnamese
A treebank is an important resource for both research and application of natural language processing. For Vietnamese, we still lack this kind of corpus. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis. We systematically applied a lot of li...
Journal title:
- CoRR
Volume: abs/1710.05519
Publication date: 2017